Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

Authors

Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang

Abstract

We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves. To incorporate function approximation, we consider a family of Markov games where the reward function and transition kernel possess a linear structure. Both the offline and online settings of the problem are considered. In the offline setting, we control both players and aim to find the Nash equilibrium by minimizing the duality gap. In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret. For both settings, we propose an optimistic variant of the least-squares minimax value iteration algorithm. We show that our algorithm is computationally efficient and provably achieves an $\widetilde{O}(\sqrt{d^3 H^3 T})$ upper bound on the duality gap and regret, where d is the linear dimension, H the horizon, and T the total number of timesteps. Our results do not require additional assumptions on the sampling model. Our setting requires overcoming several new challenges that are absent in Markov decision processes or turn-based Markov games. In particular, to achieve optimism with simultaneous moves, we construct both upper and lower confidence bounds of the value function, and then compute the optimistic policy by solving a general-sum matrix game with these bounds as the payoff matrices. As finding the Nash equilibrium of a general-sum game is computationally hard, our algorithm instead solves for a coarse correlated equilibrium (CCE), which can be obtained efficiently. To our best knowledge, such a CCE-based scheme for optimism has not appeared in the literature and might be of interest in its own right. Funding: Q. Xie is partially supported by the National Science Foundation [Grant CNS-1955997] and J.P. Morgan. Y. Chen is partially supported by the National Science Foundation [Grants CCF-1657420, CCF-1704828, CCF-2047910]. Z. Wang acknowledges the National Science Foundation [Grants 2048075, 2008827, 2015568, 1934931], the Simons Institute (Theory of Reinforcement Learning), Amazon, J.P. Morgan, and Two Sigma for their support.
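The CCE step described in the abstract reduces to a linear feasibility problem: any joint distribution over action pairs under which neither player gains by unilaterally deviating to a fixed action is a coarse correlated equilibrium, and such a distribution can be found with a linear program. The Python sketch below shows one standard way to do this; it is not the authors' implementation, and the function name coarse_correlated_equilibrium, the choice of scipy.optimize.linprog, and the 2x2 payoff matrices are illustrative assumptions. In the algorithm described above, the two payoff matrices would be the upper and lower confidence bounds of the value function at a given state.

```python
# A minimal sketch (assumed setup, not the authors' code): computing a coarse
# correlated equilibrium (CCE) of a two-player general-sum matrix game as a
# linear feasibility problem, using numpy and scipy.optimize.linprog.
import numpy as np
from scipy.optimize import linprog

def coarse_correlated_equilibrium(P, Q):
    """Return a joint distribution pi over action pairs (i, j) such that no
    player benefits from unilaterally switching to a fixed action.

    P[i, j]: payoff to player 1, Q[i, j]: payoff to player 2, when player 1
    plays action i and player 2 plays action j.
    """
    m, n = P.shape
    num_vars = m * n  # pi flattened row-major: pi[i * n + j]

    A_ub, b_ub = [], []
    # Player 1: E_pi[P(i, j)] >= E_pi[P(i_dev, j)] for every deviation i_dev,
    # i.e. sum_{i,j} pi(i, j) * (P[i_dev, j] - P[i, j]) <= 0.
    for i_dev in range(m):
        row = np.zeros(num_vars)
        for i in range(m):
            for j in range(n):
                row[i * n + j] = P[i_dev, j] - P[i, j]
        A_ub.append(row)
        b_ub.append(0.0)
    # Player 2: the symmetric constraints for every deviation j_dev.
    for j_dev in range(n):
        row = np.zeros(num_vars)
        for i in range(m):
            for j in range(n):
                row[i * n + j] = Q[i, j_dev] - Q[i, j]
        A_ub.append(row)
        b_ub.append(0.0)

    # pi must be a probability distribution.
    A_eq = [np.ones(num_vars)]
    b_eq = [1.0]

    # Pure feasibility: any feasible point is a CCE, so the objective is zero.
    res = linprog(c=np.zeros(num_vars),
                  A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, 1))
    if not res.success:
        raise RuntimeError("LP solver failed: " + res.message)
    return res.x.reshape(m, n)

if __name__ == "__main__":
    # Illustrative 2x2 general-sum payoffs (placeholders, not from the paper).
    P = np.array([[3.0, 0.0], [5.0, 1.0]])  # player 1's payoffs
    Q = np.array([[3.0, 5.0], [0.0, 1.0]])  # player 2's payoffs
    print(coarse_correlated_equilibrium(P, Q))
```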


Similar Articles

Value Function Approximation in Zero-Sum Markov Games

This paper investigates value function approximation in the context of zero-sum Markov games, which can be viewed as a generalization of the Markov decision process (MDP) framework to the two-agent case. We generalize error bounds from MDPs to Markov games and describe generalizations of reinforcement learning algorithms to Markov games. We present a generalization of the optimal stopping probl...


Approximation Results on Sampling Techniques for Zero-sum, Discounted Markov Games

We extend the “policy rollout” sampling technique for Markov decision processes to Markov games, and provide an approximation result guaranteeing that the resulting sampling-based policy is closer to the Nash equilibrium than the underlying base policy. This improvement is achieved with an amount of sampling that is independent of the state-space size. We base our approximation result on a more...


Flow Control Using the Theory of Zero Sum Markov Games

We consider the problem of dynamic flow control of arriving packets into an infinite buffer. The service rate may depend on the state of the system, may change in time, and is unknown to the controller. The goal of the controller is to design an efficient policy which guarantees the best performance under the worst service conditions. The cost is composed of a holding cost, a cost for rejecting custom...


Learning in Zero-Sum Team Markov Games Using Factored Value Functions

We present a new method for learning good strategies in zero-sum Markov games in which each side is composed of multiple agents collaborating against an opposing team of agents. Our method requires full observability and communication during learning, but the learned policies can be executed in a distributed manner. The value function is represented as a factored linear architecture and its str...


Sampling Techniques for Zero-sum, Discounted Markov Games

In this paper, we first present a key approximation result for zero-sum, discounted Markov games, providing bounds on the state-wise loss and the loss in the sup norm resulting from using approximate Q-functions. Then we extend the policy rollout technique for MDPs to Markov games. Using our key approximation result, we prove that, under certain conditions, the rollout technique gives rise to a...



Journal

Journal title: Mathematics of Operations Research

Year: 2023

ISSN: 0364-765X, 1526-5471

DOI: https://doi.org/10.1287/moor.2022.1268